(57) Abstract: A method and arrangement (300) are disclosed for detecting the presence, appearance or disappearance of subtitles
in a video signal. A very high reliability can be achieved, and a marginal processing power is needed, due to the fact that most
computations are already done by circuitry of an MPEG encoder (101-113) or decoder. A subtitle is detected if the complexity of
the image area in which subtitles are displayed substantially exceeds the complexity of at least one other image area. Examples of
properties representing the complexity are (i) the products of bit cost (b) and quantizer scale (qs) in MPEG slices, (ii) the location
of the center of gravity of the spectral DCT coefficients (c), (iii) the number of macroblocks in the subtitle image area having a
small motion vector (mv) versus the number of macroblocks having a large motion vector, or (iv) the fact that scene changes are
not simultaneously detected in the different image areas. The arrangement can be used for commercial break detection or keyframe
generation.

FIELD OF THE INVENTION

The invention relates to a method and arrangement for detecting subtitles in a
video signal.

BACKGROUND OF THE INVENTION

A known method of detecting subtitles in a video signal is disclosed in 
International Patent Application WO-A 95/01051. In this prior-art method, the number of 
signal level transitions in a television line is counted. The detection is based on the insight 
that subtitles are normally light characters on a dark background. 

OBJECT AND SUMMARY OF THE INVENTION 

It is an object of the invention to provide an alternative method and 
arrangement for detecting subtitles. 

To this end, the method in accordance with the invention divides each frame
into a first image area in which subtitles are expected to be reproduced and at least one
second image area not coinciding with said first image area, and calculates a complexity of
the first and second image areas. An output signal is generated if the complexity of the first
image area exceeds the complexity of the second image area by a predetermined ratio.

Embodiments of the method and arrangement have the advantage that existing
circuitry of MPEG encoders and/or decoders can be used. The processing power to detect the
subtitles is marginal, due to the fact that most computations are already done by circuitry in
the video encoder or decoder.

One embodiment is based on MPEG division of frames into slices each
encoded into a number of bits and a quantizer scale. The complexities of the first and second
image areas are herein calculated by summing the products of said number of bits and
quantizer scale over the slices constituting the respective image area.

A further embodiment is based on the transformation of image data into 
spectral DC and AC coefficients. The complexity of the first and second image areas is 
represented by the center of gravity of the spectral coefficients.

Another embodiment is based on MPEG division of frames into blocks having
motion vectors. The complexity of the first image area is represented by the number of blocks
having a motion vector which is smaller than a predetermined first threshold, and the
complexity of the second image area is represented by the number of blocks having a motion
vector which is larger than a predetermined second threshold.

In yet another embodiment, the motion estimation circuitry which an MPEG
encoder uses to search for resembling prediction blocks is used to detect scene changes. The
complexities of the first and second image areas are herein represented by the occurrence of a
scene change in the respective image area, and the output signal is generated if a scene
change is detected in said first image area and not in said second image area. Note that, in
this embodiment, the output signal indicates the appearance or disappearance of a subtitle
rather than its presence.

The detection of subtitles is useful in various kinds of video signal processing. 

The subtitle may be subjected to an OCR algorithm to provide an electronic
version of the text. The electronic text may be separately stored and subsequently used, for
example, for indexing video scenes stored in a database. A typical application thereof is
retrieval of video scenes in a video recorder based on spoken keywords.

A further application is the generation of key frames for retrieval or editing of
video material. A key frame is usually one of the first frames after a scene change. The
invention allows subtitled frames to be selected as key frames.

Subtitle detection may further assist in detecting commercial breaks in 
television programs. Because commercials are rarely subtitled, the absence of subtitles for a 
certain period of time during a subtitled movie is an indication of a commercial break. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows schematically an MPEG encoder including a subtitle detector in 
accordance with the invention. 

Fig. 2 shows schematically an MPEG decoder including a subtitle detector in 
accordance with the invention. 

Fig. 4 is a flow chart of operational steps carried out by a first embodiment of
the subtitle detector which is shown in Figs. 1 and 2.

Fig. 5 is a flow chart of operational steps carried out by a second embodiment
of the subtitle detector which is shown in Figs. 1 and 2.

Fig. 6 shows a timing diagram to illustrate the operation of the embodiment
which is shown in Fig. 5.

Fig. 7 is a flow chart of operational steps carried out by a third embodiment of
the subtitle detector which is shown in Figs. 1 and 2.

Figs. 8A and 8B show histograms to illustrate the operation of the embodiment
which is shown in Fig. 7.

Fig. 9 is a flow chart of operational steps carried out by a fourth embodiment 
of the subtitle detector which is shown in Figs. 1 and 2. 

DESCRIPTION OF EMBODIMENTS

Fig. 1 shows schematically an MPEG encoder including an arrangement for
detecting a subtitle in accordance with the invention. The MPEG encoder is known per se. It
comprises a circuit 101 for dividing each input image into blocks, a subtracter 102 for
subtracting a prediction block from each block, a Discrete Cosine Transform circuit 103
which transforms each block of 8x8 image pixels into a block of 8x8 spectral coefficients, a
quantizer 104, a variable-length encoder 105, a buffer 106, a bit rate control circuit 107, an
inverse quantizer 108, an inverse Discrete Cosine Transform circuit 109, an adder 110, a
frame memory 111, a motion estimation circuit 112, and a motion compensator 113. The
operation of the MPEG encoder is well known to the skilled person in the field of video
compression and will therefore not be described in more detail. An exhaustive description
can be found, inter alia, in the book "MPEG Video Compression Standard" by J.L. Mitchell et
al., ISBN 0-412-08771-5, Kluwer Academic Publishers.

Reference numeral 300 in Fig. 1 denotes the subtitle detector. Various
embodiments thereof will be described hereinafter. The detector receives input signals
produced by the MPEG encoder. The actual signal (or set of signals) being fed to the detector
depends on the embodiment. Five input signals are shown in Fig. 1 by means of encircled
signal names:

- b denotes the number of bits used for encoding an image slice excluding overhead bits, 

- qs denotes the quantizer scale for a slice, 

- c denotes the transform coefficients (DC and AC) of a macroblock,

- mv denotes the motion vector(s) of a macroblock, 

- mad denotes the mean absolute difference between an input image block and the 
prediction block found by the motion estimator.

Fig. 2 shows an MPEG decoder, comprising a variable-length decoder 201, a
slice processor 202, a macroblock processor 203, an inverse quantizer 204, an inverse
Discrete Cosine Transformer 205, an adder 206, a frame memory 207, and a motion
compensator 208. As with the encoder, a further description of this MPEG decoder does not
need to be given here. Reference numeral 300 again denotes the subtitle detector, which
receives input signals from various parts of the MPEG decoder. The signals b, mv, qs and c
in Fig. 2 are the same as in Fig. 1.

The operation of the subtitle detector 300 will now be described. As Fig. 3
shows, the detector splits the display screen into a first image area 31, in which subtitles are
usually displayed, and further image areas 32. The first image area 31 and further image
areas 32 will hereinafter also be referred to as subtitle area and non-subtitle area,
respectively. The subtitle detection algorithm is based on the significant difference between
the complexity of the second image area, where no subtitles appear, and the complexity of
the first image area where subtitles are displayed.

Fig. 4 is a flow chart of operational steps carried out by a first embodiment of
the subtitle detector 300. In this embodiment, the complexity is represented by the product of
the number of bits b used to encode the respective image area and the quantizer scale qs. For
the subtitle area, the complexity $C_1$ is:

$$C_1 = \sum_{S_1} b \times qs$$

where $S_1$ denotes the set of slices collectively forming the subtitle area. For the non-subtitle
area, the complexity $C_2$ is:

$$C_2 = \sum_{S_2} b \times qs$$

where $S_2$ denotes the set of slices collectively forming the non-subtitle area. In order to take
the different sizes of the two areas into account, the complexities $C_1$ and $C_2$ can be
normalized by dividing them by the number of macroblocks the areas cover. The
complexities $C_1$ and $C_2$ are calculated in a step 41.
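
The complexity calculation of step 41 can be sketched as follows. This is a minimal illustration in Python, not the circuitry of the encoder or decoder: the `Slice` record, its field names and the sample numbers are assumptions made for the example only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Slice:
    """Hypothetical per-slice statistics taken from the encoder or decoder."""
    bits: int             # b: bits spent on the slice, excluding overhead bits
    quantizer_scale: int  # qs: quantizer scale of the slice
    macroblocks: int      # number of macroblocks the slice covers


def area_complexity(slices: Iterable[Slice], normalize: bool = True) -> float:
    """Sum b * qs over the slices forming one image area (step 41)."""
    slices = list(slices)
    total = sum(s.bits * s.quantizer_scale for s in slices)
    if not normalize:
        return float(total)
    # Optional normalization by the number of macroblocks the area covers.
    n_mb = sum(s.macroblocks for s in slices)
    return total / n_mb if n_mb else 0.0


# Made-up numbers: two slices in the subtitle area, ten in the rest of the frame.
subtitle_area = [Slice(bits=9000, quantizer_scale=8, macroblocks=45)] * 2
other_area = [Slice(bits=3000, quantizer_scale=8, macroblocks=45)] * 10
c1 = area_complexity(subtitle_area)
c2 = area_complexity(other_area)
```

With these sample values the subtitle area comes out roughly three times as complex per macroblock as the remainder of the frame.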

In a step 42, the ratio $R_m = C_2/C_1$ for the current frame m is computed. $R_m$ is
low when a subtitle is present in the frame. If no subtitle is present, the complexities of the
two areas are comparable. The structure of a subtitle (usually white
fonts, surrounded by a small black line), and the additional fact that it is overlaid in the
original frame, causes the complexity values of the subtitle area to rise significantly. The
ratio $R_m$ will therefore decrease. The lower the ratio, the bigger and more complex the
subtitle.

A two-hour examination of available subtitled material revealed that the
minimum duration of a subtitle in a movie is two seconds. The detector calculates the ratio
$R_m$ for each I frame produced within said time period.

In a subsequent step 43, the ratios $R_m$ are summed up. In a step 44, $\sum R_m$ is
compared with a threshold Thr. A subtitle is said to be present, and an appropriate output
signal is generated in a step 45, if $\sum R_m$ is lower than said threshold. The threshold Thr is
chosen empirically from examination of available subtitled movie material.
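
Steps 42 to 45 can be sketched as follows. The function names, the sample complexity pairs and the threshold value of 2.0 are illustrative assumptions; the text only states that the threshold is chosen empirically.

```python
from typing import Sequence


def frame_ratio(c1: float, c2: float) -> float:
    """Ratio of non-subtitle to subtitle complexity for one frame (step 42)."""
    if c1 <= 0:
        raise ValueError("subtitle-area complexity must be positive")
    return c2 / c1


def subtitle_present(ratios: Sequence[float], threshold: float) -> bool:
    """Steps 43-44: sum the ratios of the I frames in the window and compare."""
    return sum(ratios) < threshold


# Hypothetical (subtitle, non-subtitle) complexity pairs for the I frames
# that fall inside one two-second window.
window = [(1600.0, 530.0), (1550.0, 540.0), (1620.0, 525.0), (1580.0, 535.0)]
ratios = [frame_ratio(c1, c2) for c1, c2 in window]
detected = subtitle_present(ratios, threshold=2.0)  # step 45 fires if True
```

Because the sum grows with the number of I frames in the window, a threshold chosen for one GOP structure would presumably need rescaling for another.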

Fig. 5 is a flow chart of operational steps carried out by a second embodiment
of the subtitle detector 300. In this embodiment, the complexity is represented by the
occurrence of a scene change in the respective image areas 31 and 32. To this end, the
subtitle detector receives the mean absolute distortion (mad) of a current macroblock from
the MPEG encoder. The mean absolute distortion MAD is a criterion used by the encoder to
locate, in the frame memory 111 (see Fig. 1), an image block which most resembles the
current input block, and to select said block to be used as prediction block for predictive
encoding. In a first step 51, the detector 300 computes the sum $\Sigma MAD_1$ of the mean absolute
distortions in the subtitle area for the actual frame, and the sum $\Sigma MAD_2$ of the mean absolute
distortions in the non-subtitle area. In a step 52, the detector computes the average values
$AvMAD_1$ and $AvMAD_2$ for all the frames (I, P and B) inside a first given timing window $t_1$
around the actual frame, excluding frames which are inside a smaller timing window $t_2$
around the frame (see Fig. 6). In a step 53, the sum $\Sigma MAD_1$ of the actual frame is compared
with the average value $AvMAD_1$ of the frames within the timing window. If the sum $\Sigma MAD_1$
is substantially higher than the average value $AvMAD_1$, the sum $\Sigma MAD_1$ is a local peak
value. In that case, a scene change has been detected in the subtitle area. In a similar manner,
the sum $\Sigma MAD_2$ is compared with the average value $AvMAD_2$ in a step 54. If $\Sigma MAD_2$ is
substantially higher than $AvMAD_2$, the sum $\Sigma MAD_2$ is a local peak value and a scene change
has been detected in the non-subtitle area. If a scene change has been detected in the subtitle
area but not in the non-subtitle area, the actual frame is indexed as that of a subtitle
appearance or disappearance. An output signal is then generated in a step 55.
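
The local-peak test of steps 52 to 55 can be sketched as follows. The window half-widths, the factor of 3 standing in for "substantially higher", and the sample sequences are assumptions chosen for illustration; the text does not give numeric values.

```python
from typing import Sequence


def is_local_peak(sums: Sequence[float], m: int, t1: int, t2: int,
                  factor: float = 3.0) -> bool:
    """Compare the MAD sum of frame m with its windowed average (steps 52-54).

    sums   -- per-frame sum of mean absolute distortions for one image area
    t1, t2 -- half-widths, in frames, of the outer and inner timing windows
    factor -- hypothetical reading of "substantially higher"
    """
    lo, hi = max(0, m - t1), min(len(sums), m + t1 + 1)
    # Frames inside the outer window but outside the inner window.
    neighbours = [sums[k] for k in range(lo, hi) if abs(k - m) > t2]
    if not neighbours:
        return False
    average = sum(neighbours) / len(neighbours)
    return sums[m] > factor * average


def subtitle_transition(mad1: Sequence[float], mad2: Sequence[float],
                        m: int, t1: int = 5, t2: int = 1) -> bool:
    """Step 55: a scene change in the subtitle area but not elsewhere."""
    return is_local_peak(mad1, m, t1, t2) and not is_local_peak(mad2, m, t1, t2)


# Made-up per-frame sums: a peak at frame 5 in the subtitle area only,
# and a variant in which the rest of the picture also changes (a real cut).
mad_subtitle = [10, 11, 10, 12, 10, 90, 11, 10, 12, 10, 11]
mad_other = [40, 42, 41, 39, 40, 43, 41, 40, 42, 41, 40]
mad_cut = [40, 42, 41, 39, 40, 400, 41, 40, 42, 41, 40]
```

Excluding the inner window keeps the peak frame and its immediate neighbours from inflating the average against which the peak is judged.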

Fig. 7 is a flow chart of operational steps carried out by a third embodiment of
the subtitle detector 300. In this embodiment, the complexity is represented by the 'center of
gravity' of the DCT coefficients c produced by the encoder or received by the decoder. In a
step 71, a histogram of the DCT coefficients $c_1(0)..c_1(63)$ of the blocks forming the subtitle
area is computed. Advantageously, this is done for I frames only. In a step 72, the same
histogram is computed for the DCT coefficients $c_2(0)..c_2(63)$ of the blocks forming the
non-subtitle area. In a step 73, the respective centers of gravity $n_1$ and $n_2$ are computed. The center
of gravity is the index n of the DCT coefficient for which:

$$\sum_{i=0}^{n} c(i) = \sum_{i=n+1}^{63} c(i)$$

This is illustrated in Figs. 8A and 8B, where Fig. 8A shows a histogram which
is typical of image areas without a subtitle, and Fig. 8B shows a histogram which is typical of
image areas with a subtitle. The difference is caused by the fact that subtitles are usually white with a
small black border, so that the blocks covering subtitles contain a larger number of high AC
coefficients.

In a step 74, the centers of gravity $n_1$ and $n_2$ are compared. If the center $n_1$
corresponds to a substantially higher spatial frequency than the center $n_2$, the actual I frame is
detected to be a subtitle frame. In that case, an output signal is generated in a step 75.
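
Steps 73 and 74 can be sketched as follows. Several points are assumptions made for this illustration: the histogram is taken to hold the accumulated magnitude of each of the 64 coefficients in scan order, exact equality of the two partial sums is replaced by the first index at which the lower sum reaches half of the total, and the margin `min_gap` is a hypothetical stand-in for "substantially higher".

```python
from typing import Sequence


def center_of_gravity(hist: Sequence[float]) -> int:
    """Smallest index n whose cumulative sum reaches half of the total.

    hist[i] is the accumulated magnitude of DCT coefficient i (0 = DC,
    63 = highest frequency) over all blocks of one image area.
    """
    total = sum(hist)
    running = 0.0
    for n, value in enumerate(hist):
        running += value
        if 2 * running >= total:
            return n
    return len(hist) - 1


def is_subtitle_frame(hist1: Sequence[float], hist2: Sequence[float],
                      min_gap: int = 4) -> bool:
    """Step 74: the subtitle-area center must exceed the other by min_gap."""
    return center_of_gravity(hist1) - center_of_gravity(hist2) >= min_gap


# Toy histograms: energy decays quickly without text and slowly with text.
plain = [100.0 * 0.5 ** i for i in range(64)]
texty = [100.0 * 0.95 ** i for i in range(64)]
subtitle_frame = is_subtitle_frame(texty, plain)
```

For a flat histogram such as `[1, 1, 1, 1]` the function returns index 1, which is the point where the sums below and above are equal, in line with the definition in the text.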

Fig. 9 is a flow chart of operational steps carried out by a fourth embodiment of
the subtitle detector 300. In this embodiment, the complexity is derived from the motion
vectors mv produced by the encoder or received by the decoder. In a step 91, it is checked
whether the motion vectors $mv_1$ of inter-macroblocks forming the subtitle area are smaller
than a given value $M_1$. In that case, a counter $n_1$ is incremented in a step 92. In a step 93, it is
checked whether the motion vectors $mv_2$ of the inter-macroblocks forming the non-subtitle
area are larger than a given value $M_2$. In that case, a counter $n_2$ is incremented in a step 94.

In a step 95, the detector checks whether the average number $n_1/N_1$ of small
motion vectors in the subtitle area exceeds the average number $n_2/N_2$ of large motion vectors
in the non-subtitle area, where $N_1$ and $N_2$ are the total number of macroblocks in the subtitle
area and non-subtitle area, respectively. If that is the case, a subtitle is said to be present, and
an appropriate output signal is produced in a step 96. This embodiment exploits the insight
that subtitles are static so that the motion vectors in the subtitle area are generally small. This
is illustrated in Fig. 10, where numerals 98 and 99 denote macroblocks having large motion
vectors and macroblocks having small (approximately zero) motion vectors, respectively.
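
The counting and comparison of steps 91 to 95 can be sketched as follows. Measuring vector size by Euclidean length, the two limits (1 and 4 pixels) and the sample vectors are assumptions for the example; the text only speaks of "given values".

```python
from math import hypot
from typing import Sequence, Tuple

Vector = Tuple[float, float]


def subtitle_by_motion(mv1: Sequence[Vector], total1: int,
                       mv2: Sequence[Vector], total2: int,
                       limit1: float = 1.0, limit2: float = 4.0) -> bool:
    """Compare the share of small vectors in the subtitle area with the
    share of large vectors in the non-subtitle area (steps 91-95).

    mv1, mv2       -- motion vectors of the inter-macroblocks in each area
    total1, total2 -- total number of macroblocks in each area
    """
    if total1 <= 0 or total2 <= 0:
        raise ValueError("both areas must contain macroblocks")
    small = sum(1 for dx, dy in mv1 if hypot(dx, dy) < limit1)  # steps 91-92
    large = sum(1 for dx, dy in mv2 if hypot(dx, dy) > limit2)  # steps 93-94
    return small / total1 > large / total2                      # step 95


# Made-up frame: a static subtitle over a picture that is partly panning.
mv_subtitle = [(0.0, 0.0)] * 80 + [(6.0, 2.0)] * 10
mv_picture = [(8.0, -3.0)] * 200 + [(0.0, 1.0)] * 250
present = subtitle_by_motion(mv_subtitle, 90, mv_picture, 450)
```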

A subtitle can also be detected by determining, for each (8x8) block of an
image, whether such block is likely a "text block", and subsequently identifying a subtitle as
an area which accommodates a significant number of contiguous text blocks. A possible text
block detection algorithm includes calculating the absolute sum of a given set of AC
coefficients, and checking said absolute sum against a threshold Thr. In mathematical
notation:

$$TB(x,y) = \begin{cases} 1 & \text{if } \sum_{(i,j) \in I,J} |AC_{x,y}(i,j)| > Thr \\ 0 & \text{if } \sum_{(i,j) \in I,J} |AC_{x,y}(i,j)| \leq Thr \end{cases}$$

where x,y denotes the position of a block within an image, i,j denotes the position of AC
coefficients within the block, and I, J denotes the coefficient positions that are taken into
account for text detection (for example, the first nine AC coefficients of a zigzag scan).
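
The thresholding rule above can be sketched as follows, using the first nine AC coefficients of the conventional 8x8 zigzag scan as the selected positions. The threshold of 100, the helper `make_block` and the coefficient values are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

# Zigzag positions 1..9 of an 8x8 block as (row, column); position 0 is DC.
FIRST_NINE_AC: List[Tuple[int, int]] = [
    (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0),
]


def is_text_block(block: Sequence[Sequence[float]], thr: float) -> int:
    """Return 1 if the summed magnitude of the selected AC terms exceeds thr."""
    energy = sum(abs(block[r][c]) for r, c in FIRST_NINE_AC)
    return 1 if energy > thr else 0


def text_block_matrix(blocks, thr: float) -> List[List[int]]:
    """Apply the test to a 2-D grid of 8x8 coefficient blocks."""
    return [[is_text_block(b, thr) for b in row] for row in blocks]


def make_block(dc: float, ac: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: build an 8x8 block from a DC and nine AC values."""
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0] = dc
    for (r, c), value in zip(FIRST_NINE_AC, ac):
        block[r][c] = value
    return block


flat = make_block(500.0, [3, -2, 1, 0, 0, 1, 0, 0, 0])
text = make_block(500.0, [60, -45, 30, -25, 20, 15, -10, 8, 5])
tb = text_block_matrix([[flat, text, text], [flat, flat, text]], thr=100.0)
```

The DC term is deliberately left out of the sum, so a bright but flat block is not mistaken for text.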



The text block detection values TB(x,y) thus obtained collectively constitute a
matrix containing 1's for possible text blocks and 0's otherwise. The text block matrix will
generally include a significant number of 1's in the subtitle area. The matrix will also include
isolated text blocks elsewhere in the image due to sharp luminance edges, and isolated
non-text blocks in the subtitle area due to misdetection or spaces between words of the subtitle.
Therefore, filtering is applied to the result of the text block detection. A first filter is used to
remove isolated text blocks. A second filter is used to close the gaps between text blocks. It
has been found that the sequence remove-close-remove-close (two iterative filter operations)
is adequate. More iterations do not improve the result significantly. The filter size may be
adjusted to the font size that is used by the respective image provider and may therefore vary
from country to country or from broadcasting station to broadcasting station.

The subtitle localization using the text block matrix can further be improved
by taking known geometric properties into account such as aspect ratio (subtitles are usually
stretched horizontally) and position (lower third of the screen). Also temporal properties
(subtitles are static for a certain period of time) may be taken into account by such
post-processing algorithm.

A method and an arrangement (300) for detecting the presence, appearance or
disappearance of subtitles in a video signal are disclosed. A very high reliability can be
achieved, and a marginal processing power is needed, due to the fact that most computations
are already done by circuitry of an MPEG encoder (101-113) or decoder. A subtitle is
detected if the complexity of the image area in which subtitles are displayed substantially
exceeds the complexity of at least one other image area. Examples of properties representing
the complexity are (i) the products of bit cost (b) and quantizer scale (qs) in MPEG slices, (ii)
the location of the center of gravity of the spectral DCT coefficients (c), (iii) the number of
macroblocks in the subtitle image area having a small motion vector (mv) versus the number
of macroblocks having a large motion vector, or (iv) the fact that scene changes are not
simultaneously detected in the different image areas.

The arrangement can be used for commercial break detection or keyframe
generation.

CLAIMS:



1. A method of detecting subtitles in a video signal, the method comprising the
steps of:

- dividing each frame into a first image area in which subtitles are expected to be
reproduced and at least one second image area not coinciding with said first image area;

- calculating a complexity of the first and second image areas;

- generating an output signal if the complexity of the first image area exceeds the
complexity of the second image area by a predetermined ratio.

2. A method as claimed in claim 1, wherein the first and second image areas are
divided into slices each encoded into a number of bits and a quantizer scale, the complexity
of the first and second image areas being calculated by summing the products of said number
of bits and quantizer scale over the slices constituting the respective image area.

3. A method as claimed in claim 1, wherein the image data in each image area
are transformed into spectral coefficients, the method further comprising the step of
calculating the center of gravity of the spectral coefficients of the respective image area, the
complexity of the first and second image areas being represented by the spectral location of
the respective center of gravity.

4. A method as claimed in claim 1, wherein the first and second image areas are
divided into blocks having motion vectors, the complexity of the first image area being
represented by the number of blocks having a motion vector which is smaller than a
predetermined first threshold, and the complexity of the second image area being represented
by the number of blocks having a motion vector which is larger than a predetermined second
threshold.

5. A method as claimed in claim 1, further comprising the steps of detecting a
scene change in said first and second image areas, wherein the complexity of the first and
second image area is represented by the occurrence of a scene change in the respective image
area, and the output signal is generated if a scene change is detected in said first image area
and not in said second image area.

6. An arrangement for detecting subtitles in a video signal, the arrangement
comprising:

- means for dividing each frame into a first image area in which subtitles are expected to be
reproduced and at least one second image area not coinciding with said first image area;

- means for calculating a complexity of the first and second image areas;

- means for generating an output signal if the complexity of the first image area exceeds
the complexity of the second image area by a predetermined ratio.